Adaptive syntax grouping and compression in video data

ABSTRACT

An encoding system may include a video source that captures a video image, a video coder, and a controller to manage operation of the system. The video coder may encode the video image into encoded video data using a plurality of subgroup parameters corresponding to a plurality of subgroups of pixels within a group. The controller may set the subgroup parameters for at least one of the subgroups of pixels in the video coder, based upon at least one parameter corresponding to the group. A decoding system may decode the video data based upon the motion prediction parameters.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/289,082, filed May 28, 2014, now allowed, the disclosure of which is hereby incorporated by reference in its entirety.

BACKGROUND

In video recording, video coding may include hierarchical picture partition techniques, such as quad-tree based partitioning, which may result in better adaptation to video content. In such techniques, an image frame may be divided into many non-overlapping largest coding units (LCU's). Each LCU may be further partitioned into smaller coding units (CU's) in a quad-tree manner (each unit is divided into four smaller units). The video coder may determine the quad-tree structure of an LCU. Inside an LCU, each CU may have its own CU level syntax variables, such as skip flag, predMode, partMode, etc. The CU's are encoded one by one with their own syntax variables. For the purpose of discussion below, an LCU may be considered a group of pixels, which includes a plurality of CU's as subgroups of pixels.
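The quad-tree partitioning described above can be illustrated with a minimal sketch. The class name CodingUnit, the function partition, and the should_split callback are hypothetical names introduced only for illustration; an actual video coder would typically choose the split structure from rate-distortion analysis rather than from a fixed rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CodingUnit:
    """A square block of pixels; a leaf is a CU, the root is the LCU."""
    x: int
    y: int
    size: int
    children: List["CodingUnit"] = field(default_factory=list)

    def leaves(self) -> List["CodingUnit"]:
        """Return the CU's (unsplit blocks) in quad-tree traversal order."""
        if not self.children:
            return [self]
        out: List["CodingUnit"] = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def partition(x: int, y: int, size: int, min_size: int,
              should_split: Callable[[int, int, int], bool]) -> CodingUnit:
    """Recursively divide a block into four equal quadrants."""
    cu = CodingUnit(x, y, size)
    if size > min_size and should_split(x, y, size):
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                cu.children.append(
                    partition(x + dx, y + dy, half, min_size, should_split))
    return cu


# Hypothetical split rule: split the 64x64 LCU once, then split only its
# top-left 32x32 quadrant a second time.
lcu = partition(0, 0, 64, 8,
                lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
sizes = [cu.size for cu in lcu.leaves()]
```

In this sketch the leaves tile the LCU without overlap, so the sum of the leaf areas equals the area of the LCU.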

Often the video content or some specific video parameters may not vary significantly from CU to CU inside an LCU (or between some subgroups of pixels within a specific group). The CU's may share similar encoding modes and syntax, which means there could be some redundancy among the syntax. Context-adaptive binary arithmetic coding (CABAC), a form of entropy encoding, and some data compression processes may remove some of the redundancy, but may not be optimal.

Thus, there is a need to reduce the redundancy by adaptively grouping the syntax in encoded video data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a communication system according to an embodiment of the present disclosure.

FIG. 2 illustrates a decoding system according to an embodiment of the present disclosure.

FIG. 3 illustrates a coding system according to an embodiment of the present disclosure.

FIG. 4 illustrates a decoding method according to an embodiment of the present disclosure.

FIG. 5 illustrates a coding method according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

FIG. 1 illustrates a simplified block diagram of a communication system 100 according to an embodiment of the present invention. The system 100 may include at least two terminals 110-120 interconnected via a network 150. For unidirectional transmission of data, a first terminal 110 may code video data at a local location for transmission to the other terminal 120 via the network 150. The second terminal 120 may receive the coded video data of the other terminal from the network 150, decode the coded data and display the recovered video data. Unidirectional data transmission may be common in media serving applications and the like.

FIG. 1 illustrates a second pair of terminals 130, 140 provided to support bidirectional transmission of coded video that may occur, for example, during videoconferencing. For bidirectional transmission of data, each terminal 130, 140 may code video data captured at a local location for transmission to the other terminal via the network 150. Each terminal 130, 140 also may receive the coded video data transmitted by the other terminal, may decode the coded data and may display the recovered video data at a local display device.

In FIG. 1, the terminals 110-140 may be illustrated as servers, personal computers and smart phones, but the principles of the present invention are not so limited. Embodiments of the present invention find application with laptop computers, tablet computers, media players and/or dedicated video conferencing equipment. The network 150 represents any number of networks that convey coded video data among the terminals 110-140, including for example wireline and/or wireless communication networks. The communication network 150 may exchange data in circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network 150 may be immaterial to the operation of the present invention unless explained herein below.

FIG. 2 may be a functional block diagram of a video decoding system 200 according to an embodiment of the present invention.

The video decoding system 200 may include a receiver 210 that receives encoded video data, a video decoder 220, a controller 228 to manage operation of the system 200 and a display 234 to display the decoded video data. The video decoder 220 may decode the received video sequence. The controller 228 may set the subgroup parameters for at least one of the subgroups of pixels in the video decoder, based upon at least one parameter corresponding to the group.

The parameters and the subgroup parameters in the encoded video may include parameters for groups of pixels, where each group includes a plurality of subgroups of pixels. Additional details of the parameters and the subgroup parameters will be described below.

The receiver 210 may receive video to be decoded by the system 200. The encoded video data may be received from a channel 212, which may be a hardware/software link to a storage device which stores the encoded video data. The receiver 210 may receive the encoded video data with other data, for example, coded audio data and/or ancillary data streams. The receiver 210 may separate the encoded video data from the other data.

The video decoder 220 may perform decoding operations on the video sequence received from the receiver 210. The video decoder 220 may include a video decoder 222, a reference picture cache 224, and a prediction mode selector 226 operating under control of controller 228. The video decoder 222 may reconstruct coded video data received from the receiver 210 with reference to reference pictures stored in the reference picture cache 224. The video decoder 222 may output reconstructed video data to display 234 for display. Reconstructed video data of reference frames also may be stored to the reference picture cache 224 for use during decoding of subsequently received coded video data.

The video decoder 222 may perform decoding operations that invert coding operations performed by the video coder 330 (shown in FIG. 3). The video decoder 222 may perform entropy decoding, dequantization, transform decoding, and filtering to generate recovered pixel block data. Quantization/dequantization operations may be lossy processes and, therefore, the recovered pixel block data likely will be a replica of the source pixel blocks that were coded by the video coder 330 (shown in FIG. 3) but may include some error. For pixel blocks coded predictively, the transform decoding may generate residual data; the video decoder 222 may use motion vectors associated with the pixel blocks to retrieve predicted pixel blocks from the reference picture cache 224 to be combined with the prediction residuals. The prediction mode selector 226 may identify a temporal prediction mode being used for each pixel block of an encoded frame being decoded and request the needed data for the decoding to be read from the reference picture cache 224.

The video decoder 220 may perform decoding operations according to a predetermined protocol, such as H.263, H.264, MPEG-2, HEVC. In its operation, the video decoder 220 may perform various decoding operations, including predictive decoding operations that exploit temporal and spatial redundancies in the encoded video sequence. The coded video data, therefore, may conform to a syntax specified by the protocol being used.

The parameters may be received as part of the syntax specified by the protocol in the coded video data, or appended as an ancillary portion of the coded video data, to allow for backward compatibility.

In an embodiment, the receiver 210 may receive additional data with the encoded video.

The additional data may be included as part of the encoded video frames. The additional data may be used by the video decoder 220 to properly decode the data and/or to more accurately reconstruct the original video data.

FIG. 3 may be a functional block diagram of a video coding system 300 according to an embodiment of the present invention.

The system 300 may include a video source 310 that captures a video image to be coded by the system 300, a video coder 330, a transmitter 340, and a controller 350 to manage operation of the system 300. The video coder 330 may encode the video image into encoded video data using a plurality of subgroup parameters corresponding to a plurality of subgroups of pixels within a group. The controller 350 may set the subgroup parameters for at least one of the subgroups of pixels in the video coder, based upon at least one parameter corresponding to the group. The transmitter 340 may transmit the video data.

The video source 310 may provide video to be coded by the system 300. In a media serving system, the video source 310 may be a storage device storing previously prepared video. In a videoconferencing system, the video source 310 may be a camera that captures local image information as a video sequence. Video data typically may be provided as a plurality of individual frames that impart motion when viewed in sequence. The frames themselves typically may be organized as a spatial array of pixels.

According to an embodiment, the system 300 may code and compress the image information for frames of the video sequence in real time, based upon one or more parameters. The controller 350 may control the compression and coding in video coder 330, based on the parameters.

As part of its operation, the video coder 330 may perform motion compensated predictive coding, which codes an input frame predictively with reference to one or more previously-coded frames from the video sequence that were designated as “reference frames.” In this manner, the coding engine 332 codes differences between pixel blocks of an input frame and pixel blocks of reference frame(s) that may be selected as prediction reference(s) to the input frame.

The local video decoder 333 may decode coded video data of frames that may be designated as reference frames. Operations of the coding engine 332 typically may be lossy processes. When the coded video data is decoded at a video decoder (not shown in FIG. 3), the recovered video sequence typically may be a replica of the source video sequence with some errors. The local video decoder 333 replicates decoding processes that will be performed by the video decoder on reference frames and may cause reconstructed reference frames to be stored in the reference picture cache 334. In this manner, the system 300 may store copies of reconstructed reference frames locally that have common content as the reconstructed reference frames that will be obtained by a far-end video decoder (absent transmission errors).

The predictor 335 may perform prediction searches for the coding engine 332. That is, for a new frame to be coded, the predictor 335 may search the reference picture cache 334 for image data (as candidate reference pixel blocks) that may serve as an appropriate prediction reference for the new frames. The predictor 335 may operate on a pixel block-by-pixel block basis to find appropriate prediction references. In some cases, as determined by search results obtained by the predictor 335, an input frame may have prediction references drawn from multiple frames stored in the reference picture cache 334.

The controller 350 may manage coding operations of the video coder 330, including, for example, setting of parameters and subgroup parameters used for encoding the video data.

The transmitter 340 may buffer coded video data to prepare it for transmission via a communication channel 360, which may be a hardware/software link to a storage device which would store the encoded video data. The transmitter 340 may merge coded video data from the video coder 330 with other data to be transmitted, for example, coded audio data and/or ancillary data streams (sources not shown).

The controller 350 may manage operation of the system 300. During coding, the controller 350 may assign to each frame a certain frame type (either of its own accord or in cooperation with the controller 350), which may affect the coding techniques that may be applied to the respective frame. For example, frames often may be assigned as one of the following frame types:

An Intra Frame (I frame) may be one that may be coded and decoded without using any other frame in the sequence as a source of prediction.

A Predictive Frame (P frame) may be one that may be coded and decoded using intra prediction or inter prediction using at most one motion vector and reference index to predict the sample values of each block.

A Bi-directionally Predictive Frame (B frame) may be one that may be coded and decoded using intra prediction or inter prediction using at most two motion vectors and reference indices to predict the sample values of each block.

Frames commonly may be parsed spatially into a plurality of pixel blocks (for example, blocks of 4×4, 8×8 or 16×16 pixels each) and coded on a pixel block-by-pixel block basis. Pixel blocks may be coded predictively with reference to other coded pixel blocks as determined by the coding assignment applied to the pixel blocks' respective frames. For example, pixel blocks of I frames may be coded non-predictively or they may be coded predictively with reference to pixel blocks of the same frame (spatial prediction). Pixel blocks of P frames may be coded non-predictively, via spatial prediction or via temporal prediction with reference to one previously coded reference frame. Pixel blocks of B frames may be coded non-predictively, via spatial prediction or via temporal prediction with reference to one or two previously coded reference frames.

The video coder 330 may perform coding operations according to a predetermined protocol, such as H.263, H.264, MPEG-2, HEVC. In its operation, the video coder 330 may perform various compression operations, including predictive coding operations that exploit temporal and spatial redundancies in the input video sequence. The coded video data, therefore, may conform to a syntax specified by the protocol being used.

In an embodiment, the transmitter 340 may transmit additional data with the encoded video. The video coder 330 may include such data as part of the encoded video frames.

In an embodiment according to the invention, syntax of video data may be grouped and set by the controller 350 in video coder 330 to reduce redundancy.

The syntax predMode is used as an illustrative example here. Similar methods of implementation may be used for other video data syntax. PredMode of a CU is a flag that indicates whether a CU is encoded with inter mode or intra mode. Each non-skip CU may expect to be encoded with a predMode value to indicate whether the CU will be predicted by inter or intra mode, and then related syntax may provide additional data for the prediction of the CU. In many cases, there is no intra mode inside any of the CU's in an entire LCU. In such a case, the predMode syntax for all the CU's in an LCU may be compressed by signaling a flag at the LCU level to indicate that it is an all inter-mode LCU. Then there would be no need to encode the predMode syntax for each individual CU inside a specific LCU.

In an example coding unit semantics, “all_inter_flag” may be coded for an LCU (or a group of pixels with multiple sub-groups). “all_inter_flag”=1 specifies that the current LCU includes CU's all predicted in inter mode, which means individual CU's in the current LCU no longer need their own predMode syntax. “all_inter_flag”=0 specifies that the current LCU includes CU's predicted in either intra or inter modes. If “all_inter_flag”=0, then individual CU's inside the current LCU may be further coded with their own predMode syntax. See the semantics below:

coding_tree_unit( ) {                                                   Descriptor
    xCtb = ( CtbAddrInRs % PicWidthInCtbsY ) << CtbLog2SizeY
    yCtb = ( CtbAddrInRs / PicWidthInCtbsY ) << CtbLog2SizeY
    ...
    all_inter_flag[ x0 ][ y0 ]                                          ae(v)
    coding_quadtree( xCtb, yCtb, CtbLog2SizeY, 0, all_inter_flag )
    ...

coding_quadtree( x0, y0, log2CbSize, cqtDepth, all_inter_flag ) {       Descriptor
    ...
    coding_unit( x0, y0, log2CbSize, all_inter_flag )
    ...

coding_unit( x0, y0, log2CbSize, all_inter_flag ) {                     Descriptor
    ...
    if( slice_type != I && !all_inter_flag )
        pred_mode_flag                                                  ae(v)
    ...

Thus, a significant number of bits may be saved in the encoded video data, which would reduce bandwidth during video data transmission and reduce storage space for the video file.
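The LCU-level flag described above can be sketched as follows. This is only an illustration under stated assumptions: syntax elements are modeled as a list of (name, value) pairs rather than entropy-coded bits, the slice type check and skip CU's are ignored, and the function names encode_lcu_pred_modes and decode_lcu_pred_modes are hypothetical.

```python
INTER, INTRA = 0, 1


def encode_lcu_pred_modes(pred_modes):
    """Emit the syntax elements for one LCU as (name, value) pairs.

    When every CU is inter-predicted, only all_inter_flag is emitted and
    the per-CU pred_mode_flag elements are omitted.
    """
    all_inter = int(all(mode == INTER for mode in pred_modes))
    elements = [("all_inter_flag", all_inter)]
    if not all_inter:
        elements += [("pred_mode_flag", mode) for mode in pred_modes]
    return elements


def decode_lcu_pred_modes(elements, num_cus):
    """Recover the per-CU prediction modes, inferring the omitted ones."""
    it = iter(elements)
    name, all_inter = next(it)
    if name != "all_inter_flag":
        raise ValueError("LCU must start with all_inter_flag")
    if all_inter:
        # No pred_mode_flag was coded; every CU is inferred to be inter.
        return [INTER] * num_cus
    modes = []
    for _ in range(num_cus):
        _, value = next(it)
        modes.append(value)
    return modes


all_inter_lcu = [INTER, INTER, INTER, INTER]
mixed_lcu = [INTER, INTRA, INTER, INTER]
coded_all_inter = encode_lcu_pred_modes(all_inter_lcu)
coded_mixed = encode_lcu_pred_modes(mixed_lcu)
```

In this model an all-inter LCU with four CU's carries one syntax element instead of four, while a mixed LCU carries one extra element (the flag itself), which is the cost of the scheme when the default does not hold.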

Similarly, other encoding syntax and parameters may have redundancy reduced by similar implementation, saving even more bits. Examples include partition mode, skip mode, SAO parameters, residue flags, transform split, and transform depth. Additionally, more than one syntax element can be grouped together, sharing the same signaling information at the LCU level. For example, predMode and partMode can be grouped together.

Additionally, while the above example illustrates the case in coding CU's and LCU's, other groupings of pixels comprising subgroups of pixels may implement similar methods in their syntax to reduce redundancy. Examples include slice headers, SPS, PPS, and regions of interest (ROI).

The above example illustrates a syntax grouping that uses a “signaled default signaled exception” method. That is, the syntax signals a default setting of 1 for “all_inter_flag” for CU's within an LCU, and also signals an exception setting where CU's for a particular LCU are not all inter mode. However, many other methods for syntax grouping are possible.

In an embodiment, a syntax grouping may use an “assumed default signaled exception” method. In this method, for example, the syntax may assume that all CU's are by default in inter 2N×2N partition mode, and only signals partMode syntax for a CU if the CU is predicted in other partition modes (2N×N, N×2N, or N×N). In this way, the syntax at the LCU level (e.g., an all 2N×2N flag) does not even need to be used, saving even more bits.
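A minimal sketch of the “assumed default signaled exception” idea follows. The text does not specify how an exception is located in the bitstream, so the dictionary keyed by CU index used here is purely an illustrative stand-in, and the function names are hypothetical.

```python
DEFAULT_PART_MODE = "2Nx2N"  # default assumed by both coder and decoder


def encode_part_modes(part_modes):
    """Keep only the CU's whose partition mode differs from the default."""
    return {index: mode
            for index, mode in enumerate(part_modes)
            if mode != DEFAULT_PART_MODE}


def decode_part_modes(exceptions, num_cus):
    """Fill in the assumed default wherever no exception was signaled."""
    return [exceptions.get(index, DEFAULT_PART_MODE)
            for index in range(num_cus)]


part_modes = ["2Nx2N", "2Nx2N", "Nx2N", "2Nx2N"]
exceptions = encode_part_modes(part_modes)
```

Because the default is never signaled, an LCU whose CU's all use the default mode contributes no partMode syntax at all in this model.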

In an embodiment, a syntax grouping may use a “signaled baseline signaled change” method, for more complex variables such as a numeric parameter (for example, color values or coefficients). A numeric parameter in coding may have some baseline or average value from which individual CU's may deviate only slightly. In such a case, the LCU may be signaled by syntax for the baseline value of that numeric parameter, and individual CU's may be signaled by syntax for the amount of deviation from the baseline value. If the deviations are small enough, the signaled deviation value may be coded with very few bits, saving a significant number of bits in the encoded video data.

In an embodiment, a syntax grouping may use a “signaled change hold until next change” method, for parameters that change infrequently. In such cases, the encoded video may start with a CU (a subgroup) that has syntax of an initial value for a parameter coded as a deviation from zero value. Then, the parameter would not need to be coded again until a CU (a subgroup) has a parameter value that deviates from the previous value. In other words, the video coder and the video decoder would assume that the parameter value is held from CU to CU, unless the parameter syntax is coded otherwise for a particular CU. Once the parameter value changes for a particular CU, that new parameter value holds until the next change.
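A minimal sketch of the hold-until-next-change behavior follows. It assumes that every change, including the initial one, is coded as a deviation from the previously held value (starting from zero), and it represents the coded changes as (CU index, deviation) pairs; both choices and the function names are illustrative assumptions rather than details taken from the text.

```python
def encode_hold(values):
    """Record a (cu_index, deviation) pair only where the value changes."""
    changes = []
    held = 0  # coder and decoder both start from a zero value
    for index, value in enumerate(values):
        if value != held:
            changes.append((index, value - held))
            held = value
    return changes


def decode_hold(changes, num_cus):
    """Hold the current value from CU to CU until a change is signaled."""
    deviation_at = dict(changes)
    values, held = [], 0
    for index in range(num_cus):
        held += deviation_at.get(index, 0)
        values.append(held)
    return values


cu_values = [26, 26, 26, 28, 28, 26]
changes = encode_hold(cu_values)
```

For the six CU's above only three changes are recorded; the remaining CU's carry no syntax for this parameter and inherit the held value.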

As illustrated above in the various embodiments of methods for syntax grouping, the methods can be enabled or disabled by syntax at higher levels (e.g., SPS, PPS, VPS, and/or slice header), for example by including in the syntax of the higher levels a control parameter that indicates whether all of the plurality of the subgroup parameters within the group are encoded in the encoded video data, or a control parameter that indicates a specific mode of compressing the plurality of the subgroup parameters within the group in the encoded video data. The control parameter may correspond to controlling one group or a plurality of groups of pixels at the lower levels.

As illustrated above in the various embodiments of methods for syntax grouping, the different syntax grouping methods may have different data size efficiency depending on the nature of the various subgroup parameters intended for redundancy reduction. Thus, it may be necessary for the controller 350 to analyze the video data before encoding to determine and/or select an optimal method of syntax grouping for a specific set of subgroup parameters in a group. Additionally, the methods of syntax grouping may vary from one group of pixels to the next group of pixels. The video coder 330 may encode additional syntax information to signal to the video decoder which method of syntax grouping is used to code a specific group of pixels.

While the above examples illustrate scenarios of implementations for a group of pixels containing contiguous subgroups of pixels, the embodied syntax grouping may be implemented for non-contiguous groups of pixels or a group of non-contiguous subgroups of pixels.
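One way a controller might compare grouping methods for a numeric subgroup parameter is to estimate the coded size under each method and keep the smallest. The sketch below is an assumption-laden illustration: the three candidate methods, the signed Exp-Golomb bit estimate, the one-bit "changed" marker in the hold method, and all function names are hypothetical and not taken from the text.

```python
def se_bits(value):
    """Estimated bits for an integer, using signed Exp-Golomb length."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


def cost_explicit(values):
    """Every CU codes its own value (no grouping)."""
    return sum(se_bits(v) for v in values)


def cost_baseline(values):
    """One baseline per group plus a per-CU deviation from it."""
    baseline = round(sum(values) / len(values))
    return se_bits(baseline) + sum(se_bits(v - baseline) for v in values)


def cost_hold(values):
    """One 'changed' bit per CU plus a deviation whenever the value changes."""
    bits, held = 0, 0
    for v in values:
        bits += 1
        if v != held:
            bits += se_bits(v - held)
            held = v
    return bits


METHODS = {"explicit": cost_explicit,
           "baseline": cost_baseline,
           "hold": cost_hold}


def select_grouping_method(values):
    """Return the cheapest method name and the cost table for a group."""
    costs = {name: cost(values) for name, cost in METHODS.items()}
    return min(costs, key=costs.get), costs


jittery = [100, 101, 99, 100, 102, 98, 100, 100]  # small spread around 100
stepped = [10, 10, 10, 10, 50, 50, 50, 50]        # two long plateaus
best_jittery, costs_jittery = select_grouping_method(jittery)
best_stepped, costs_stepped = select_grouping_method(stepped)
```

With these estimates, values that jitter around a common level favor the baseline method, while values that sit on long plateaus favor the hold method, which matches the observation that the efficiency of each method depends on the nature of the subgroup parameters.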

FIG. 4 illustrates a decoding method 400 according to an embodiment of the present disclosure.

At block 410, the system 200 may parse the syntax of the encoded video data.

At block 420, the controller 228 may determine, from the parsed syntax, parameters and set subgroup parameters to be used for decoding. Some subgroup parameters may be absent for specific subgroups and the controller 228 may determine these absent subgroup parameters by calculations based on other subgroup parameters and parameters of the group, in accordance with specific syntax grouping methods.

At block 430, the controller 228 may control the video decoder 220 to decode video data using the set subgroup parameters. That is, some subgroups may not include corresponding subgroup parameters of specific types in the encoded video data. The absent subgroup parameters may be derived from other subgroup parameters and/or parameters of the group by the video decoder 220 or the controller 228 in accordance with the syntax grouping methods.

FIG. 5 illustrates a coding method 500 according to an embodiment of the present disclosure.

At block 510, the system 300 may analyze the video image from the video source 310 for subgroup parameters for encoding of the subgroups of pixels.

At block 520, the controller 350 may determine if any subgroup parameters may be reduced for redundancy in encoding, according to predetermined syntax grouping methods. The controller 350 may also select the optimal syntax grouping method for specific types of subgroup parameters based upon the analysis of the subgroup parameters. The controller 350 may also set parameters for the group as part of the encoding of the group. The parameters for the group may or may not be included in the encoded video data, and may be derived from some of the subgroup parameters.

At block 530, the video coder 330 may encode video data using the reduced redundancy subgroup parameters. That is, some subgroups may not include corresponding subgroup parameters of specific types in the encoded video data. The absent subgroup parameters may be derived later from other subgroup parameters and/or parameters of the group by a video decoder in accordance with the syntax grouping methods.

It is appreciated that the disclosure is not limited to the described embodiments, and that any number of other scenarios and embodiments are possible.

Although the disclosure has been described with reference to several exemplary embodiments, it is understood that the words that have been used are words of description and illustration, rather than words of limitation. Changes may be made within the purview of the appended claims, as presently stated and as amended, without departing from the scope and spirit of the disclosure in its aspects. Although the disclosure has been described with reference to particular means, materials and embodiments, the disclosure is not intended to be limited to the particulars disclosed; rather the disclosure extends to all functionally equivalent structures, methods, and uses such as are within the scope of the appended claims.

While the computer-readable medium may be described as a single medium, the term “computer-readable medium” includes a single medium or multiple media, such as a centralized or distributed database, and/or associated caches and servers that store one or more sets of instructions. The term “computer-readable medium” shall also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor or that cause a computer system to perform any one or more of the embodiments disclosed herein.

The computer-readable medium may comprise a non-transitory computer-readable medium or media and/or comprise a transitory computer-readable medium or media. In a particular non-limiting, exemplary embodiment, the computer-readable medium can include a solid-state memory such as a memory card or other package that houses one or more non-volatile read-only memories. Further, the computer-readable medium can be a random access memory or other volatile re-writable memory. Additionally, the computer-readable medium can include a magneto-optical or optical medium, such as a disk or tapes or other storage device to capture carrier wave signals such as a signal communicated over a transmission medium. Accordingly, the disclosure is considered to include any computer-readable medium or other equivalents and successor media, in which data or instructions may be stored.

Although the present application describes specific embodiments which may be implemented as code segments in computer-readable media, it is to be understood that dedicated hardware implementations, such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement one or more of the embodiments described herein. Applications that may include the various embodiments set forth herein may broadly include a variety of electronic and computer systems. Accordingly, the present application may encompass software, firmware, and hardware implementations, or combinations thereof.

Although the present specification describes components and functions that may be implemented in particular embodiments with reference to particular standards and protocols, the disclosure is not limited to such standards and protocols. Such standards are periodically superseded by faster or more efficient equivalents having essentially the same functions. Accordingly, replacement standards and protocols having the same or similar functions are considered equivalents thereof.

The illustrations of the embodiments described herein are intended to provide a general understanding of the various embodiments. The illustrations are not intended to serve as a complete description of all of the elements and features of apparatus and systems that utilize the structures or methods described herein. Many other embodiments may be apparent to those of skill in the art upon reviewing the disclosure. Other embodiments may be utilized and derived from the disclosure, such that structural and logical substitutions and changes may be made without departing from the scope of the disclosure. Additionally, the illustrations are merely representational and may not be drawn to scale. Certain proportions within the illustrations may be exaggerated, while other proportions may be minimized. Accordingly, the disclosure and the figures are to be regarded as illustrative rather than restrictive.

One or more embodiments of the disclosure may be referred to herein, individually and/or collectively, by the term “disclosure” merely for convenience and without intending to voluntarily limit the scope of this application to any particular disclosure or inventive concept. Moreover, although specific embodiments have been illustrated and described herein, it should be appreciated that any subsequent arrangement designed to achieve the same or similar purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all subsequent adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the description.

In addition, in the foregoing Detailed Description, various features may be grouped together or described in a single embodiment for the purpose of streamlining the disclosure. This disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter may be directed to less than all of the features of any of the disclosed embodiments. Thus, the following claims are incorporated into the Detailed Description, with each claim standing on its own as defining separately claimed subject matter.

The above disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments which fall within the true spirit and scope of the present disclosure. Thus, to the maximum extent allowed by law, the scope of the present disclosure is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description.

1. A system comprising: a receiver receiving encoded video data; a video decoder decoding the encoded video data using a plurality of subgroup parameters corresponding to a plurality of subgroups of pixels within a group; and a controller setting the subgroup parameters for at least one of the subgroups of pixels in the video decoder, based upon at least one parameter corresponding to the group, wherein the encoded video data does not include the subgroup parameters for the at least one of the subgroups of pixels but includes the subgroup parameters for at least one other of the subgroups of pixels. 2.-23. (canceled)